Information Presentation Through Ambient Sounds

ABSTRACT

A computer system detects an object in a field-of-view (FOV) using at least one sensor coupled to the computer system and determines a shape of the object using a processor of the computer system. The gaze direction of a user of the computer system is determined using at least one sensor coupled to the system. If the object shape intersecting with the current gaze direction has an associated ambient sound that conveys information about the object, the sound is presented using one or more speakers coupled to the computer system.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is related to U.S. Provisional Application 62/682,424, entitled Display Metaphors, filed Jun. 8, 2018, and to U.S. patent application Ser. No. 16/007,204, entitled Information Display by Overlay on an Object, filed Jun. 13, 2018, both of which are hereby incorporated by reference in their entirety herein for any and all purposes.

BACKGROUND

Technical Field

The present subject matter relates to displaying information, and more specifically, to presenting information as an ambient sound associated with an object.

Background Art

Many situations require the presentation of information to a user in a way that lets the user receive the information when it is needed without distracting or confusing the user or obscuring potentially more relevant information. One of many professions where this is important is emergency response, where the ability to receive the right information at the right time can be a matter of life or death. Traditionally, emergency responders have relied on audio transmissions over a radio for a majority of their information, but that is changing with the advent of widespread wireless digital communication.

Another new technology that is making its way into the world of emergency responders is digital displays. These displays may be on a handheld device, such as a mobile phone, or on a head-mounted display (HMD), such as a virtual reality (VR) display or an augmented reality (AR) display, which may be integrated into their emergency equipment, such as their helmet. Textual information can be presented to the emergency responder through the display and the information can be updated in real-time through the digital wireless interface from a command center or other information sources.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute part of the specification, illustrate various embodiments. Together with the general description, the drawings serve to explain various principles. In the drawings:

FIG. 1A shows a scene with a user wearing an embodiment of a head-mounted display looking away from an object;

FIG. 1B shows a scene with a user wearing an embodiment of a head-mounted display looking at an object;

FIG. 2 is a block diagram of an embodiment of a hybrid reality system;

FIG. 3 is a flowchart of an embodiment of a method for presenting an ambient sound associated with an object; and

FIG. 4 is a flowchart of an embodiment of a method for presenting sound to a user.

DETAILED DESCRIPTION

In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. However, it should be apparent to those skilled in the art that the present teachings may be practiced without such details. In other instances, well known methods, procedures and components have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present concepts. A number of descriptive terms and phrases are used in describing the various embodiments of this disclosure. These descriptive terms and phrases are used to convey a generally agreed upon meaning to those skilled in the art unless a different definition is given in this specification. Some descriptive terms and phrases are presented in the following paragraphs for clarity.

Hybrid Reality (HR), as the phrase is used herein, refers to an image that merges real-world imagery with imagery created in a computer, which is sometimes called virtual imagery. While an HR image can be a still image, it can also be a moving image, such as imagery created using a video stream. HR can be displayed by a traditional two-dimensional display device, such as a computer monitor, one or more projectors, or a smartphone screen. HR imagery can also be displayed by a head-mounted display (HMD). Many different technologies can be used in an HMD to display HR imagery. A virtual reality (VR) HMD system may receive images of a real-world object, objects, or scene, and composite those images with a virtual object, objects, or scene to create an HR image. An augmented reality (AR) HMD system may present a virtual object, objects, or scene on a transparent screen which then naturally mixes the virtual imagery with a view of a scene in the real-world. A display which mixes live video with virtual objects is sometimes denoted AR, but for the purposes of this disclosure, an AR HMD includes at least a portion of the display area that is transparent to allow at least some of the user's view of the real-world to be directly viewed through the transparent portion of the AR HMD. The display used by an HR system represents a scene which is a visible portion of the whole environment. As used herein, the terms “scene” and “field of view” (FOV) are used to indicate what is visible to a user.

The word “occlude” is used herein to mean that a pixel of a virtual element is mixed with an image of another object to change the way the object is perceived by a viewer. In a VR HMD, this can be done through use of a compositing process to mix the two images, a Z-buffer technique to remove elements of the image that are hidden from view, a painter's algorithm to render closer objects later in the rendering process, or any other technique that can replace a pixel of the image of the real-world object with a different pixel value generated from any blend of real-world object pixel value and an HR system determined pixel value. In an AR HMD, the virtual object occludes the real-world object if the virtual object is rendered, transparently or opaquely, in the line of sight of the user as they view the real-world object. In the following description, the terms “occlude”, “transparency”, “rendering” and “overlay” are used to denote the mixing or blending of new pixel values with existing object pixel values in an HR display.

In some embodiments of HR systems, there are sensors which provide the information used to render the HR imagery. A sensor may be mounted on or near the display, on the viewer's body, or be remote from the user. Remote sensors may include, but are not limited to, fixed sensors attached in an environment, sensors attached to robotic extensions, sensors attached to autonomous or semi-autonomous drones, or sensors attached to other persons. Data from the sensors may be raw or filtered. Data from the sensors may be transmitted wirelessly or using a wired connection.

Sensors used by some embodiments of HR systems include, but are not limited to, a camera that captures images in the visible spectrum, an infrared depth camera, a microphone, a sound locator, a Hall effect sensor, an air-flow meter, a fuel level sensor, an oxygen sensor, an electronic nose, a gas detector, an anemometer, a mass flow sensor, a Geiger counter, a gyroscope, an infrared temperature sensor, a flame detector, a barometer, a pressure sensor, a pyrometer, a time-of-flight camera, radar, or lidar. Sensors in some HR system embodiments that may be attached to the user include, but are not limited to, a biosensor, a biochip, a heartbeat sensor, a pedometer, a skin resistance detector, or skin temperature detector.

The display technology used by an HR system embodiment may include any method of projecting an image to an eye. Conventional technologies include, but are not limited to, cathode ray tube (CRT), liquid crystal display (LCD), light emitting diode (LED), plasma or organic LED (OLED) screens, or projectors based on those technologies or digital micromirror devices (DMD). It is also contemplated that virtual retina displays, such as direct drawing on the eye's retina using a holographic grating, may be used. It is also contemplated that direct machine to brain interfaces may be used in the future.

The display of an HR system may also be an HMD or a separate device, such as, but not limited to, a hand-held mobile phone, a tablet, a fixed monitor or a TV screen.

The connection technology used by an HR system may include any physical link and associated protocols, such as, but not limited to, wires, transmission lines, solder bumps, near-field connections, infra-red connections, or radio frequency (RF) connections such as cellular, satellite or Wi-Fi® (a registered trademark of the Wi-Fi Alliance). Virtual connections, such as software links, may also be used to connect to external networks and/or external compute.

In many HR embodiments, aural stimuli and information may be provided by a sound system. The sound technology may include monaural, binaural, or multi-channel systems. A binaural system may include a headset or another two-speaker system but may also include systems with more than two speakers directed to the ears. The sounds may be presented as 3D audio, where each sound has a perceived position in space, achieved by using reverberation and head-related transfer functions to mimic how sounds change as they move in a particular space.

In many HR system embodiments, objects in the display may move. The movement may be due to the user moving within the environment, for example walking, crouching, turning, or tilting the head. The movement may be due to an object moving, for example a dog running away, a car coming towards the user, or a person entering the FOV. The movement may also be due to an artificial movement, for example the user moving an object on a display or changing the size of the FOV. In one embodiment, the motion may be due to the user deliberately distorting all or part of the FOV, for example adding a virtual fish-eye lens. In the following description, all motion is considered relative; any motion may be resolved to a motion from a single frame of reference, for example the user's viewpoint.

When there is motion in an HR system, the perspective of any generated object overlay may be corrected so that it changes with the shape and position of the associated real-world object. This may be done with any conventional point-of-view transformation based on the angle of the object from the viewer; note that the transformation is not limited to simple linear or rotational functions, with some embodiments using non-Abelian transformations. It is contemplated that motion effects, for example blur or deliberate edge distortion, may also be added to a generated object overlay.

In some HR embodiments, images from cameras, whether sensitive to one or more of visible, infra-red, or microwave spectra, may be processed before algorithms are executed. Algorithms used after image processing for embodiments disclosed herein may include, but are not limited to, object recognition, motion detection, camera motion and zoom detection, light detection, facial recognition, text recognition, or mapping an unknown environment. The image processing may also use conventional filtering techniques, such as, but not limited to, static, adaptive, linear, non-linear, and Kalman filters. Deep-learning neural networks may be trained in some embodiments to mimic functions which are hard to create algorithmically. Image processing may also be used to prepare the image, for example by reducing noise, restoring the image, edge enhancement, or smoothing.

In some HR embodiments, objects may be detected in the FOV of one or more cameras. Objects may be detected by using conventional algorithms, such as, but not limited to, edge detection, feature detection (for example surface patches, corners and edges), greyscale matching, gradient matching, pose consistency, or database look-up using geometric hashing. Genetic algorithms and trained neural networks using unsupervised learning techniques may also be used in embodiments to detect types of objects, for example people, dogs, or trees.

In embodiments of an HR system, object recognition may be performed on a single frame of a video stream, although techniques using multiple frames are also envisioned. Advanced techniques, such as, but not limited to, Optical Flow, camera motion, and object motion detection may be used between frames to enhance object recognition in each frame.

After object recognition, rendering the object may be done by the HR system embodiment using databases of similar objects, the geometry of the detected object, or how the object is lit, for example specular reflections or bumps.

In some embodiments of an HR system, the locations of objects may be generated from maps and object recognition from sensor data. Mapping data may be generated on the fly using conventional techniques, for example the Simultaneous Localization and Mapping (SLAM) algorithm used to estimate locations using Bayesian methods, or extended Kalman filtering which linearizes a non-linear system model to estimate the mean and covariance of a state (map), or particle filters which use Monte Carlo methods to estimate hidden states (map). The locations of objects may also be determined a priori, using techniques such as, but not limited to, reading blueprints, reading maps, receiving GPS locations, receiving relative positions to a known point (such as a cell tower, access point, or other person) determined using depth sensors, Wi-Fi time-of-flight, or triangulation to at least three other points.

Gyroscope sensors on or near the HMD may be used in some embodiments to determine head position and to generate relative motion vectors which can be used to estimate location.

In embodiments of an HR system, sound data from one or more microphones may be processed to detect specific sounds. Sounds that might be identified include, but are not limited to, human voices, glass breaking, human screams, gunshots, explosions, door slams, or a sound pattern a particular machine makes when defective. Gaussian Mixture Models and Hidden Markov Models may be used to generate statistical classifiers that are combined and looked up in a database of sound models. One advantage of using statistical classifiers is that sounds can be detected more consistently in noisy environments.

In some embodiments of an HR system, eye tracking of one or both of the viewer's eyes may be performed. Eye tracking may be used to measure the point of the viewer's gaze. In an HMD, the position of each eye is known, and so there is a reference frame for determining head-to-eye angles, and so the position and rotation of each eye can be used to estimate the gaze point. Eye position determination may be done using any suitable technique and/or device, including, but not limited to, devices attached to an eye, tracking the eye position using infra-red reflections, for example Purkinje images, or using the electric potential of the eye detected by electrodes placed near the eye, which relies on the electrical field generated by an eye independently of whether the eye is closed or not.

Turning now to the current disclosure, systems that display HR imagery are becoming increasingly common and are making their way from entertainment and gaming into industrial and commercial applications. Examples of systems that may find HR imagery useful include aiding a person doing a task, for example repairing machinery, testing a system, or responding to an emergency.

Many of the same environments where HR imagery might be used also provide information to a user. This information may be associated with real objects in the environment or may be related to the environment as a whole, for example an ambient or average value. In other cases, the information to be provided to the user is unrelated to the real environment they are working in. Providing the various types of information to the user in a way that can be readily understood by the user and is not confusing, distracting or obscuring details that the user needs can be a challenge.

In an HR system which aids a person doing a task, for example repairing machinery, testing a system, or responding to an emergency, it is often critical to present information to the user. Traditionally, speech and/or textual information have been the primary ways to provide information to a user. While those modes of information delivery have advantages in the amount of detail that they can provide and the wide range of information that they can convey, understanding detailed speech or textual information diverts attention and takes concentration away from the task at hand, which can be dangerous. Even the presentation of simple, basic information can easily escalate when there are many instances of such information to be provided by the HR system.

Using HR technology, information can be presented to a user using sounds presented in a non-intrusive and natural way. In particular, sound information can be mixed with the current sound presentation as ambient or background noise, which adds information without being distracting. Further, the HR system has capabilities that can determine which object is being viewed, and in many situations can determine characteristics of the object currently being viewed. By combining these features, simple information can be presented to the user without interfering with the operation of the HR system by having to remove or obscure any part of the information currently being delivered.

In an example embodiment of an HR system, gaze detection hardware may determine the gaze direction of a user. By using object recognition techniques from a video feed, one or more objects in the current field-of-view can be detected, including the boundary edges or convex hull of an object or group of objects. By considering the internal intersection of the gaze direction and object boundaries, the HR system can determine which object is currently being viewed. Note that selecting a current object over periods of time, for example using different frames of video, can also offer other pertinent information, such as: a time that a user starts looking at an object; a time that a user stops looking at an object; and how long the user looks at an object.
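In one non-limiting illustration, the intersection test described above may be sketched as a point-in-polygon check. The sketch assumes that the gaze direction has already been projected to a 2D point in the same image coordinates as the object boundaries, and that each boundary is a simple polygon; the names `point_in_polygon`, `viewed_object`, and the example coordinates are hypothetical and not part of any embodiment.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the closed polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def viewed_object(gaze: Point, boundaries: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Return the first object whose boundary contains the gaze point, if any."""
    for name, polygon in boundaries.items():
        if point_in_polygon(gaze, polygon):
            return name
    return None


# Hypothetical object boundaries in image coordinates.
boundaries = {
    "hydrant": [(4.0, 1.0), (6.0, 1.0), (6.0, 5.0), (4.0, 5.0)],
    "vehicle": [(10.0, 0.0), (18.0, 0.0), (18.0, 4.0), (10.0, 4.0)],
}
```

Calling `viewed_object` once per video frame yields the sequence of viewed objects from which start times, stop times, and viewing durations may be derived.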

Note that detecting an object does not have to be performed using a current video feed of the field-of-view. An object may be detected using other methods, such as, but not limited to, blueprints, maps, or object positions determined by other actors, for example other personnel, drones, or fixed sensors in the environment. In some cases, the selected object may not be visible to the user, perhaps being obscured by a physical barrier, smoke, or lack of illumination.

Once an object has been selected by the HR system as currently being viewed, information related to the object is determined. The information may be any pertinent information, such as, but not limited to, a type or class (e.g. vehicle, building, wall, hydrant, or person), a physical property which may have been received from an additional sensor coupled to the HR system (e.g. temperature, pressure, mass, or velocity), a proximal hazard, an identity of something near, behind or inside the object (e.g. water pressure, amount of fuel, or number of passengers), or a safe path to the object. Note that the information is often not directly visible to the user.

The characteristic information can be used by the HR system to choose an associated sound. The HR system may start the sound playing when the user starts to look at the object. In one example, the sound may be an excerpt which plays for a predetermined time which may vary according to the characteristic. In another example, the sound continues to play until the user stops looking at the object, or offers a specific gesture, such as, but not limited to, blinking, operating a menu or quickly glancing out and then back to the object.
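As one possible illustration of the second example above, the start and stop behavior may be modeled as a small state machine that is updated once per frame with the currently viewed object. This is a minimal sketch under stated assumptions: the class name `GazeTracker`, the event tuples, and the use of timestamps in seconds are hypothetical, and gesture-based stopping is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class GazeTracker:
    """Tracks the viewed object and records when its sound should start or stop.

    Events are (kind, object, value) tuples: for "start" the value is the
    start time, for "stop" it is how long the object was viewed.
    """
    current: Optional[str] = None
    started_at: float = 0.0
    events: List[Tuple[str, str, float]] = field(default_factory=list)

    def update(self, viewed: Optional[str], now: float) -> None:
        if viewed == self.current:
            return  # the gaze has not moved to a different object
        if self.current is not None:
            # The user stopped looking: stop the sound, record the dwell time.
            self.events.append(("stop", self.current, now - self.started_at))
        if viewed is not None:
            # The user started looking at a new object: start its sound.
            self.events.append(("start", viewed, now))
            self.started_at = now
        self.current = viewed
```

For example, updating the tracker with `None` at 0.0 s, `"hydrant"` at 1.0 s and 1.5 s, and `None` at 3.0 s records one start event at 1.0 s and one stop event with a dwell time of 2.0 s.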

Some example ambient sounds and the associated information being contemplated are listed in Table 1. These examples are non-limiting and any particular sound could be associated with any information depending on the embodiment.

TABLE 1

Sound               Information
Sea                 Liquid
River               Water
Running Faucet      Liquid that can extinguish a fire
Fire crackling      Fire
Bacon sizzling      Hot
Storm               Danger
Lightning crack     Live electricity
Blizzard            Cold
Shivering person    Potential for ice
Growling wolf       Danger
Engine sound        Approaching vehicle
Heart-beat          Hidden person near/behind object
“Hello”             Hidden person near/behind object
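The associations of Table 1 lend themselves to a simple lookup structure. The following sketch shows one way this might be done, using a subset of the table keyed by the information to be conveyed. The file names, the function name `ambient_sound_for`, and the optional `overrides` argument (standing in for an embodiment-specific association) are assumptions made for illustration only.

```python
from typing import Dict, Optional

# A subset of Table 1, keyed by information. File names are hypothetical.
AMBIENT_SOUNDS: Dict[str, str] = {
    "water": "river.wav",
    "fire": "fire_crackling.wav",
    "hot": "bacon_sizzling.wav",
    "live electricity": "lightning_crack.wav",
    "cold": "blizzard.wav",
    "approaching vehicle": "engine.wav",
    "hidden person": "heartbeat.wav",
}


def ambient_sound_for(information: str,
                      overrides: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the sound associated with the information, or None if there is none."""
    key = information.strip().lower()
    # An embodiment-specific association takes priority over the default table.
    if overrides and key in overrides:
        return overrides[key]
    return AMBIENT_SOUNDS.get(key)
```

Returning `None` when no association exists corresponds to the case where no ambient sound is presented for the viewed object.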

The HR system may make the sound non-intrusive when played by mixing the sound to be in the background without degrading the current audio feed to the user. In one example, the sound is presented in a binaural system as having no apparent origin, thus emphasizing the background nature. In another example, the sound is presented in a binaural system as having a position in the 3D audio landscape, for example coming from the object, thus reducing any potential for distraction.
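A rough sketch of such background mixing is given below. It is an assumption-laden simplification: the ambient sound is attenuated by a fixed gain and added to a two-channel feed, and position is approximated with constant-power panning rather than the head-related transfer functions mentioned earlier. The names `pan_gains`, `mix_ambient`, the default gain of 0.2, and the sample range of -1.0 to 1.0 are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Stereo = Tuple[float, float]


def pan_gains(azimuth_deg: float) -> Stereo:
    """Constant-power pan: -90 is hard left, 0 is centre, +90 is hard right."""
    clamped = max(-90.0, min(90.0, azimuth_deg))
    angle = (clamped + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(angle), math.sin(angle)


def clip(sample: float) -> float:
    """Keep a sample inside the assumed -1.0..1.0 range."""
    return max(-1.0, min(1.0, sample))


def mix_ambient(feed: Sequence[Stereo], ambient: Sequence[float],
                azimuth_deg: float = 0.0, ambient_gain: float = 0.2) -> List[Stereo]:
    """Add a quiet, looped mono ambient clip to an existing stereo feed."""
    if not ambient:
        return list(feed)
    left_gain, right_gain = pan_gains(azimuth_deg)
    mixed = []
    for i, (left, right) in enumerate(feed):
        # Loop the ambient clip and attenuate it so it stays in the background.
        sample = ambient[i % len(ambient)] * ambient_gain
        mixed.append((clip(left + sample * left_gain),
                      clip(right + sample * right_gain)))
    return mixed
```

An azimuth of 0 places the sound at equal level in both ears, which loosely approximates the first example above, while a non-zero azimuth derived from the object's position loosely approximates the second.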

In some embodiments, the sound may be music or a music excerpt. Since the reaction to certain sounds and music is largely a personal experience, it is contemplated that the sound and information association may be customized by the user; this may be done using any conventional technique such as, but not limited to, a menu, a wizard, or importing a configuration. This allows the user to select a sound or music that has an immediate, significant and natural meaning to them.

Reference now is made in detail to the examples illustrated in the accompanying drawings and discussed below.

FIG. 1A shows a user 100 wearing an embodiment of a head-mounted system 130. As indicated by the schematic eye 110A, user 100 is looking away from fire hydrant 120 at the time shown in FIG. 1A. Embodiments of a head-mounted system 130 may include a visible camera 132, infra-red depth camera 134 and an internal infra-red eye gaze tracking system 136. In embodiments the eye gaze tracking system 136 may be located inside of the head-mounted system 130 and may not be visible to an external observer. Other embodiments may have different sensors, such as any sensor or combination of sensors described above. The head-mounted system 130 receives data from sensors 132, 134 and determines that fire hydrant 120 is an object in the FOV. Head-mounted system 130 receives data from eye gaze tracking system 136 and determines that the gaze position 111 does not intersect the fire hydrant 120 and so no ambient sound is added to sound subsystem 138.

FIG. 1B shows the user 100 wearing the embodiment of the head-mounted system 130. As indicated by the schematic eye 112B, user 100 is looking at fire hydrant 120 at the time shown in FIG. 1B. Similarly to the vignette shown in FIG. 1A, the head-mounted system 130 receives data from sensors 132, 134 and determines that fire hydrant 120 is an object in the FOV. Head-mounted system 130 receives data from the eye gaze tracking system 136 and determines that the gaze position 113 intersects the fire hydrant 120 and so an associated ambient sound is played for the user 100 by sound subsystem 138, such as, but not limited to, a music excerpt selected by the user, the sound of a river, the sound of a faucet running, or the sound of the sea.

FIG. 2 is a block diagram of an embodiment of an HR system 200 which may have some components implemented as part of a head-mounted assembly. The HR system 200 may be considered a computer system that can be adapted to be worn on the head, carried by hand, or otherwise attached to a user. In the embodiment of the HR system 200 shown, a structure 205 is included which is adapted to be worn on the head of a user. The structure 205 may include straps, a helmet, a hat, or any other type of mechanism to hold the HR system on the head of the user as an HMD.

The HR system 200 also includes a display 250. The structure 205 may position the display 250 in a field-of-view (FOV) of the user. In some embodiments, the display 250 may be a stereoscopic display with two separate views of the FOV, such as view 252 for the user's left eye, and view 254 for the user's right eye. The two views 252, 254 may be shown as two images on a single display device or may be shown using separate display devices that are included in the display 250. In some embodiments, the display 250 may be transparent, such as in an augmented reality (AR) HMD. In systems where the display 250 is transparent, the view of the FOV of the real-world as seen through the display 250 by the user is composited with virtual objects that are shown on the display 250. The virtual objects may occlude real objects in the FOV as overlay elements and may themselves be transparent or opaque, depending on the technology used for the display 250 and the rendering of the virtual object. A virtual object, such as an overlay element, may be positioned in a virtual space that could be two-dimensional or three-dimensional, depending on the embodiment, to be in the same position as an associated real object in real space. Note that if the display 250 is a stereoscopic display, two different views of the overlay element may be rendered and shown in two different relative positions on the two views 252, 254, depending on the disparity as defined by the inter-ocular distance of a viewer.
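The dependence of the two relative positions on the inter-ocular distance may be illustrated with a simple pinhole projection. This sketch assumes two parallel virtual cameras separated horizontally by the inter-ocular distance, a point at lateral offset `x` and depth `z` in metres measured from the midpoint between the eyes, and a normalized focal length; the function name `stereo_positions` and the default distance of 0.063 m are hypothetical choices, not values taken from any embodiment.

```python
from typing import Tuple


def stereo_positions(x: float, z: float, ipd: float = 0.063,
                     focal: float = 1.0) -> Tuple[float, float]:
    """Horizontal image position of a point in the left and right views.

    Each eye is modeled as a pinhole camera offset by half the inter-ocular
    distance (ipd) from the midpoint between the eyes.
    """
    if z <= 0.0:
        raise ValueError("the point must be in front of the viewer")
    half = ipd / 2.0
    left = focal * (x + half) / z   # left eye sits at -half, so the offset is x + half
    right = focal * (x - half) / z  # right eye sits at +half, so the offset is x - half
    return left, right
```

Under this model the disparity, `left - right`, equals `focal * ipd / z`, so an overlay element associated with a nearer object is drawn with a larger separation between the two views than one associated with a distant object.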

In some embodiments, the HR system 200 includes one or more sensors in a sensing block 240 to sense at least a portion of the FOV of the user by gathering the appropriate information for that sensor, for example visible light from a visible light camera, from the FOV of the user. Any number of any type of sensor, including sensors described previously herein, may be included in the sensing block 240, depending on the embodiment. In the embodiment shown, the sensing block 240 includes an eye gaze detection subsystem 242.

The HR system 200 may also include an I/O block 220 to allow communication with external devices. The I/O block 220 may include one or both of a wireless network adapter 222 coupled to an antenna 224 and a network adapter 226 coupled to a wired connection 228. The wired connection 228 may be plugged into a portable device, for example a mobile phone, or may be a component of an umbilical system such as used in extreme environments.

In some embodiments, the HR system 200 includes a sound processor 260 which takes input from one or more microphones 262. In some HR systems 200, the microphones 262 may be attached to the user. External microphones, for example attached to an autonomous drone, may send sound data samples through wireless or wired connections to I/O block 220 instead of, or in addition to, the sound data received from the microphones 262. The sound processor 260 may generate sound data which is transferred to one or more speakers 264, which are a type of sound reproduction device. The generated sound data may be analog samples or digital values. If more than one speaker 264 is used, the sound processor may generate or simulate 2D sound placement. In some HR systems 200, a first speaker may be positioned to provide sound to the left ear of the user and a second speaker may be positioned to provide sound to the right ear of the user. Together, the first speaker and the second speaker may provide binaural sound to the user.

In some embodiments, the HR system 200 includes a stimulus block 270. The stimulus block 270 is used to provide other stimuli to expand the HR system user experience. Embodiments may include numerous haptic pads attached to the user that provide a touch stimulus. Embodiments may also include other stimuli, such as, but not limited to, changing the temperature of a glove, changing the moisture level or breathability of a suit, or adding smells to a breathing system.

The HR system 200 may include a processor 210 and one or more memory devices 230, which may also be referred to as a tangible medium or a computer readable medium. The processor 210 is coupled to the display 250, the sensing block 240, the memory 230, I/O block 220, sound block 260, and stimulus block 270, and is configured to execute the instructions 232 encoded on (i.e. stored in) the memory 230. Thus, the HR system 200 may include an article of manufacture comprising a tangible medium 230, that is not a transitory propagating signal, encoding computer-readable instructions 232 that, when applied to a computer system 200, instruct the computer system 200 to perform one or more methods described herein, thereby configuring the processor 210.

While the processor 210 included in the HR system 200 may be able to perform methods described herein autonomously, in some embodiments, processing facilities outside of that provided by the processor 210 included inside of the HR system 200 may be used to perform one or more elements of methods described herein. In one non-limiting example, the processor 210 may receive information from one or more of the sensors 240 and send that information through the wireless network adapter 222 to an external processor, such as a cloud processing system or an external server. The external processor may then process the sensor information to identify an object in the FOV and send information about the object, such as its shape and location in the FOV, to the processor 210 through the wireless network adapter 222.

In some embodiments, the instructions 232 may instruct the HR system 200 to detect an object in a field-of-view (FOV) using at least one sensor 240 coupled to the computer system 200 and establish a first boundary of the object. The instructions 232 may further instruct the HR system 200 to determine the boundary of a second or other objects in the field of view.

The instructions 232 may further instruct the HR system 200 to determine an eye gaze direction using at least one sensor 240, such as the eye gaze detection subsystem 242, coupled to the computer system 200.

The instructions 232 may further instruct the HR system 200 to determine whether the eye gaze direction intersects, or is within, the first object boundary and, if within the boundary, determine whether there is an associated ambient sound with that object. In one non-limiting example, the instructions 232 instruct the HR system 200 to determine the object type within the boundary and use the object type as an index to lookup in a table of associated ambient sounds. If an association is present, the ambient sound may be mixed with other sound output by sound processor 260 and sent to speakers 264 to play the ambient sound to the user.

In at least one embodiment, the processor 210 may be configured to detect a gaze direction of an eye of the wearer of the HR system 200 using the eye gaze detection subsystem 242 and to select an object based on the gaze direction. The processor then obtains information related to the object and chooses a sound based on the information. A digital representation of the sound is then rendered to the user through at least one sound reproduction device, such as the speaker 264. In at least one embodiment, the HR system 200 includes a head-mounted display (HMD) with a transparent portion so that the user can see a real-world object through the transparent portion of the display 250. In such embodiments, the processor 210 may be further configured to receive sensor data related to the object from the sensor 240 and determine positions of one or more items based on the sensor data, the one or more items including the object. The processor may also be configured to select the object from the one or more items based on the determined positions of the one or more items and the gaze direction.
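One way the selection from determined positions and a gaze direction might be sketched is by choosing the item whose direction from the eye makes the smallest angle with the gaze direction. The sketch below is only an assumption about how such a selection could work: it treats each item as a single 3D point, uses a hypothetical angular tolerance of 5 degrees, and the names `angle_between` and `select_item` are illustrative.

```python
import math
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


def angle_between(a: Vec3, b: Vec3) -> float:
    """Angle in radians between two vectors; pi if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return math.pi
    # Clamp to guard against rounding slightly outside the acos domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def select_item(eye: Vec3, gaze_dir: Vec3, items: Dict[str, Vec3],
                max_angle_deg: float = 5.0) -> Optional[str]:
    """Return the item closest to the gaze direction within the tolerance."""
    best = None
    best_angle = math.radians(max_angle_deg)
    for name, position in items.items():
        # Direction from the eye to the item.
        to_item = tuple(p - e for p, e in zip(position, eye))
        angle = angle_between(gaze_dir, to_item)
        if angle <= best_angle:
            best, best_angle = name, angle
    return best
```

A point-based test of this kind ignores the extent of each item; an embodiment that has established object boundaries may instead test the gaze direction against those boundaries.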

Aspects of various embodiments are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, systems, and computer program products according to various embodiments disclosed herein. It will be understood that various blocks of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks. The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and/or block diagrams in the figures help to illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products of various embodiments. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

FIG. 2 is a flowchart 200 of an embodiment of a method for presenting an ambient sound associated with an object. After the flowchart 200 starts at box 201, the user's gaze direction is determined at input box 202. In decision box 204, if the gaze is on an object that has an associated ambient sound, the flow moves to decision box 206; if the gaze is not on an object with an associated ambient sound, the flow returns to input box 202. In decision box 206, if the associated ambient sound is an excerpt, the flow moves to process box 210 which plays the excerpt for a pre-determined period; if the associated ambient sound is not an excerpt, the flow moves to process box 216 which starts playing an ambient sound.

After the excerpt finishes in process box 210, the user's gaze direction is determined at input box 211. In the decision box 212, if the user's gaze is still on the object, the gaze direction is repeatedly determined at input box 211 until the gaze is no longer on the object, at which point the flow returns to the start box 201.

After the associated ambient sound is started in process box 216, the user's gaze direction is determined at input box 217. At decision box 218, if the user's gaze is still on the object, the gaze direction is repeatedly determined at input box 217 until the gaze is no longer on the object, at which point the flow returns to the start box 201.

It is contemplated that the excerpt or ambient sound played may change while playing for any reason, for example a changing status of the object currently located in the eye gaze direction.
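As a non-limiting sketch, the flow described above for excerpts and ambient sounds might be modeled as a simple polling loop over successive gaze determinations. The function name run_ambient_flow, the event labels, and the representation of each gaze determination as an object identifier (or None) are hypothetical. The "stop_ambient" event emitted when the gaze leaves the object is an assumption of this example; the description above states only that the flow returns to the start.

```python
def run_ambient_flow(gaze_samples, sounds):
    """Model the excerpt/ambient flow over a sequence of gaze determinations.

    gaze_samples: iterable of object identifiers (or None), one per poll.
    sounds: dict mapping object identifier -> (sound_name, is_excerpt).
    Returns a list of (event, sound_name) tuples in the order they occur.
    """
    events = []
    current = None  # object whose sound was most recently started
    for target in gaze_samples:
        if current is not None:
            if target == current:
                continue  # gaze still on the object: keep polling
            name, is_excerpt = sounds[current]
            if not is_excerpt:
                # Assumption: a continuous ambient sound stops when gaze leaves.
                events.append(("stop_ambient", name))
            current = None  # flow returns to start
        if target in sounds:
            name, is_excerpt = sounds[target]
            events.append(("play_excerpt" if is_excerpt else "start_ambient", name))
            current = target
    return events
```

In this sketch, a sound is not restarted while the gaze remains on the same object, which mirrors the repeated gaze determination described above.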

FIG. 4 is a flowchart 400 of an embodiment of a method for presenting sound to a user. The method starts 401 and a gaze direction of the user's eye is detected 402 using gaze detection hardware integrated into a head-mounted display (HMD) in some embodiments. The gaze detection hardware may determine the gaze direction using one eye or both eyes of the user, depending on the embodiment. The flowchart 400 continues by selecting 403 an object based on the gaze direction. In some embodiments, a camera or other sensor may capture an image of a field-of-view (FOV) of the user and detect one or more objects in the image. A boundary for an object or group of objects may be defined and the gaze direction used to project a vector from the eye of the user to determine if the gaze direction intersects with a boundary of an object. If an intersection occurs, the object may be selected.
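For purposes of illustration only, projecting a vector from the eye and testing it against object boundaries might be sketched as below. The use of a spherical boundary, the selection of the nearest intersected object when several are intersected, and the names gaze_hits_sphere and select_object are assumptions of this example and are not required by any embodiment.

```python
import math


def gaze_hits_sphere(eye, direction, center, radius):
    """Return the distance along the gaze ray to a spherical boundary, or None."""
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0:
        return None  # no usable gaze direction
    d = [c / norm for c in direction]
    to_center = [c - e for c, e in zip(center, eye)]
    t_closest = sum(a * b for a, b in zip(to_center, d))
    dist_sq = sum(a * a for a in to_center) - t_closest ** 2
    if dist_sq > radius ** 2:
        return None  # the ray passes outside the boundary
    half_chord = math.sqrt(radius ** 2 - dist_sq)
    if t_closest + half_chord < 0:
        return None  # the boundary is entirely behind the eye
    return max(t_closest - half_chord, 0.0)


def select_object(eye, direction, objects):
    """Select the nearest object whose boundary the gaze ray intersects.

    objects: dict mapping a name to a (center, radius) pair.
    """
    best = None
    for name, (center, radius) in objects.items():
        t = gaze_hits_sphere(eye, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, name)
    return best[1] if best else None
```

Other boundary shapes, such as boxes or polygons in an image plane, could be substituted for the sphere without changing the overall selection logic.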

While the object may be visible to the user from their current position, in some embodiments the object may be hidden from view of the user. The location of the object may be determined using sensors that are not located on or near the user or the position of the object may be known from maps, blueprints, a radio-frequency beacon transmitted by the object, or any other suitable technique.

In some embodiments, the method may show the object to the user on a display and use a position of the display relative to the user's eye to detect the gaze position of the user's eye. The object may be a real-world object or a virtual object, depending on the embodiment. If the object is a real-world object, it may be shown to the user on the display by transmitting light reflected by, or generated by, the real-world object through a transparent portion of the display to the user's eye.

The flowchart 400 continues with obtaining 404 information related to the object. In some embodiments, the information related to the object may include a type or class of the object. In some embodiments, the information may be based on a physical property of the object, such as a temperature, a pressure, a state of matter, a mass, or a velocity.

In some embodiments, the physical property of the object may be sensed 442 using a hardware sensor and the sensor data based on that sensing received 443 and used to determine 444 the information related to the object based on the sensor data. In some embodiments, the information may include a hazard related to the object, an identity of something near the object, an identity of something behind the object, an identity of something inside of the object, or an indication of a safe path to the object.
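One possible, non-limiting way to determine the information from received sensor data is sketched below for a sensed temperature. The threshold value, the "temperature_c" key, the "burn" hazard label, and the names ObjectInfo and determine_info are hypothetical and are assumed here only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold above which an object is treated as a burn hazard.
HOT_THRESHOLD_C = 60.0


@dataclass
class ObjectInfo:
    object_type: str
    hazard: Optional[str] = None


def determine_info(object_type: str, sensor_data: dict) -> ObjectInfo:
    """Derive information related to the object from received sensor data."""
    temperature = sensor_data.get("temperature_c")
    hazard = None
    if temperature is not None and temperature >= HOT_THRESHOLD_C:
        hazard = "burn"
    return ObjectInfo(object_type=object_type, hazard=hazard)
```

Other physical properties, such as pressure or velocity, could be handled with additional keys and thresholds in the same manner.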

A sound is then chosen 405 based on the information and the sound may be selected to convey the information to the user. The sound can include music or any other type of sound. In some embodiments, an association between the information and the sound is based on a setting provided by the user. In some embodiments, the information being conveyed by the sound is not visible to the user's eye by viewing the object.
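As a minimal sketch, assuming the information has been reduced to a short key, the choice of a sound with a user-provided setting taking precedence over a default association might look like the following. The default table and the name choose_sound are hypothetical, and the precedence of the user setting over the default is an assumption of this example.

```python
from typing import Dict, Optional

# Hypothetical default associations between information and sounds.
DEFAULT_ASSOCIATIONS = {
    "burn": "sizzle",
    "stolen": "siren",
}


def choose_sound(info_key: str,
                 user_settings: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Choose a sound for the information, preferring a user-provided setting."""
    if user_settings and info_key in user_settings:
        return user_settings[info_key]
    return DEFAULT_ASSOCIATIONS.get(info_key)
```

For example, a user who prefers a different cue for hot objects could supply {"burn": "crackle"} as the setting, while information without an entry in that setting would still fall back to the default association.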

A digital representation of the sound is retrieved 406. The digital representation may be in any form, compressed (lossy or lossless), or uncompressed, and encoded in any format, including, but not limited to, an MP3 file, pulse-code modulated data, or Advanced Audio Coding (AAC) data. The digital representation may be retrieved from any available location, including, but not limited to, local memory, an optical disc, or a remote server.

The flowchart 400 continues with rendering 407 the digital representation of the sound to the user. If the digital representation is retrieved over a network connection, the digital representation may be downloaded and stored locally before it is rendered, or it may be streamed from the remote source in real-time as it is rendered.
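The two alternatives described above, downloading and storing locally before rendering versus rendering while streaming, might be sketched as follows. The names retrieve_and_store and render_streamed are hypothetical, and the fetch and render_chunk callables stand in for whatever network and audio interfaces a given embodiment provides; no particular library is implied.

```python
def retrieve_and_store(sound_id, cache, fetch):
    """Download once and store locally; later requests are served from the cache."""
    if sound_id not in cache:
        cache[sound_id] = fetch(sound_id)
    return cache[sound_id]


def render_streamed(chunks, render_chunk):
    """Render each chunk as it arrives instead of waiting for the whole sound.

    Returns the total number of bytes rendered.
    """
    total = 0
    for chunk in chunks:
        render_chunk(chunk)
        total += len(chunk)
    return total
```

Storing the sound locally avoids repeated transfers for sounds that recur often, whereas streaming allows rendering to begin before the full representation has arrived.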

In some embodiments, the digital representation of the sound continues to be rendered, or played, for a predetermined period of time. In other embodiments, the rendering continues for a period of time determined based on the information.

In some embodiments, the start and/or stop of the rendering may be controlled by the user. A predetermined eye gesture performed by the user may be detected 472 and, depending on the context and the type of gesture, the rendering may be started or stopped 474 in response. Eye gestures that may be used include, but are not limited to, a change in the gaze direction of the user's eye from a first position pointing away from the object to a second position pointing at the object, an eye blink by the user while the gaze direction of the user's eye is pointing at the object, or a change in the gaze direction of the user's eye from the second position pointing at the object to a third position pointing away from the object.
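By way of example and not limitation, the eye gestures listed above might be classified from two consecutive gaze determinations and a blink indication as sketched below. The gesture labels and the names classify_gesture and update_rendering are hypothetical. Treating a blink on the object as a toggle is an assumption of this sketch; the description above leaves the effect of each gesture to the context.

```python
from typing import Optional


def classify_gesture(was_on_object: bool, is_on_object: bool,
                     blinked: bool = False) -> Optional[str]:
    """Classify an eye gesture from consecutive gaze states and a blink flag."""
    if not was_on_object and is_on_object:
        return "gaze_enter"       # away from the object -> at the object
    if was_on_object and not is_on_object:
        return "gaze_exit"        # at the object -> away from the object
    if is_on_object and blinked:
        return "blink_on_object"  # blink while gazing at the object
    return None


def update_rendering(rendering: bool, gesture: Optional[str]) -> bool:
    """Return the new rendering state after a gesture (True means playing)."""
    if gesture == "gaze_enter":
        return True
    if gesture == "gaze_exit":
        return False
    if gesture == "blink_on_object":
        return not rendering  # assumption: a blink toggles the rendering
    return rendering
```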

The rendering of the digital representation of the sound may be performed to make the sound non-intrusive to the user. This may be accomplished in many different ways, such as mixing the sound to be a background sound with other sounds or presenting the sound to the user as a non-directional sound. In some embodiments the sound may be presented to the user as a directional sound originating at the object.
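As a non-limiting sketch, mixing the sound as a background sound and presenting it as a directional sound might be approximated as shown below. The sample representation (floating-point values between -1.0 and 1.0), the default gain, the constant-power panning law, and the names mix_background and pan_gains are assumptions of this example; an embodiment may use any suitable mixing or spatialization technique.

```python
import math


def mix_background(foreground, ambient, ambient_gain=0.2):
    """Mix an attenuated ambient sound under other sound, clipping to [-1, 1]."""
    length = max(len(foreground), len(ambient))
    mixed = []
    for i in range(length):
        f = foreground[i] if i < len(foreground) else 0.0
        a = ambient[i] if i < len(ambient) else 0.0
        mixed.append(max(-1.0, min(1.0, f + ambient_gain * a)))
    return mixed


def pan_gains(azimuth_deg):
    """Constant-power stereo gains for a source at the given azimuth.

    -90 degrees is fully left and +90 degrees is fully right.
    Returns a (left_gain, right_gain) pair.
    """
    azimuth = max(-90.0, min(90.0, azimuth_deg))
    theta = (azimuth + 90.0) / 180.0 * (math.pi / 2)
    return math.cos(theta), math.sin(theta)
```

A low ambient gain keeps the sound in the background, while the pan gains give a simple stereo approximation of a sound that appears to originate at the object.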

Embodiments may be useful in a variety of applications and in a variety of environments. Non-limiting examples of environments where embodiments may be used are described below.

One example application of an embodiment is a virtual guide dog for a visually impaired, but not completely blind, individual. An HR HMD may be used as a virtual guide dog and as the user looks towards an object, the sound played can provide additional information about the object that the visually impaired user may not be able to discern using only their eyes. For example, if there is an object which is very hot, a sizzling sound may be played in response to the visually impaired user looking toward the object, making it clear that there is potential danger even though the object cannot be recognized.

An environment where embodiments may be useful is immersive entertainment venues where there are multiple performances occurring simultaneously throughout the venue. Several different performances may be visible to a user from a single vantage point, making it difficult to discern what sounds are coming from which performance. An embodiment may be used to detect which performance the user is gazing at and then emphasize the sound from that performance. In another example, embodiments may be used at an orchestra concert to allow the sound from a particular performer or instrument to be emphasized to the user of the embodiment. In some cases, this emphasized sound may not be enabled until the user performs an eye gesture to start it, such as staring at the performer for longer than a predetermined period. Note that emphasizing the sound may be done by diminishing sounds based on a direction but may also be done by selecting audio tracks that are delivered to the HR system from, for example, a centralized console or mixing desk.
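The staring gesture mentioned above might be detected, purely as an illustrative sketch, by measuring how long the gaze remains continuously on one target. The name dwell_triggered and the representation of the gaze history as (timestamp in seconds, gazed object) pairs are hypothetical.

```python
def dwell_triggered(samples, target, dwell_seconds):
    """Return True once the gaze has stayed on target longer than dwell_seconds.

    samples: iterable of (timestamp_seconds, gazed_object) in time order.
    """
    start = None  # time at which the current uninterrupted stare began
    for timestamp, gazed in samples:
        if gazed == target:
            if start is None:
                start = timestamp
            if timestamp - start > dwell_seconds:
                return True
        else:
            start = None  # looking away resets the dwell timer
    return False
```

In this sketch, glancing away from the performer resets the timer, so only an uninterrupted stare longer than the predetermined period enables the emphasized sound.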

In another example of use, a police officer may utilize an embodiment during their patrol. As the police officer moves through their environment, different automobiles may be automatically identified from their license plate and information about the car, such as whether or not it is stolen, illegally parked, or its tags are expired, may be provided by different ambient sounds played for the police officer. For example, a siren sound may be played if the car is stolen, an alarm clock sound played if the tags are expired, and an excerpt of music in a minor key played if the car is illegally parked.
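The associations in the police patrol example might be represented, as one non-limiting possibility, by an ordered list of status flags and sounds. The flag names, the sound labels, and the name sounds_for_vehicle are hypothetical, and returning every applicable sound in list order is an assumption of this sketch.

```python
# Hypothetical status flags paired with the sounds from the example above.
VEHICLE_SOUNDS = [
    ("stolen", "siren"),
    ("tags_expired", "alarm_clock"),
    ("illegally_parked", "minor_key_excerpt"),
]


def sounds_for_vehicle(status: dict) -> list:
    """Return the sounds for every flag that is set in the vehicle status."""
    return [sound for flag, sound in VEHICLE_SOUNDS if status.get(flag)]
```

A vehicle with no flags set yields an empty list, in which case no ambient sound would be played for that vehicle.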

As will be appreciated by those of ordinary skill in the art, aspects of the various embodiments may be embodied as a system, device, method, or computer program product apparatus. Accordingly, elements of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, or the like) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “server,” “circuit,” “module,” “client,” “computer,” “logic,” or “system,” or other terms. Furthermore, aspects of the various embodiments may take the form of a computer program product embodied in one or more computer-readable medium(s) having computer program code stored thereon.

Any combination of one or more computer-readable storage medium(s) may be utilized. A computer-readable storage medium may be embodied as, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or other like storage devices known to those of ordinary skill in the art, or any suitable combination of computer-readable storage mediums described herein. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program and/or data for use by or in connection with an instruction execution system, apparatus, or device. Even if the data in the computer-readable storage medium requires action to maintain the storage of data, such as in a traditional semiconductor-based dynamic random access memory, the data storage in a computer-readable storage medium can be considered to be non-transitory. A computer data transmission medium, such as a transmission line, a coaxial cable, a radio-frequency carrier, and the like, may also be able to store data, although any data storage in a data transmission medium can be said to be transitory storage. Nonetheless, a computer-readable storage medium, as the term is used herein, does not include a computer data transmission medium.

Computer program code for carrying out operations for aspects of various embodiments may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Python, C++, or the like, conventional procedural programming languages, such as the “C” programming language or similar programming languages, or low-level computer languages, such as assembly language or microcode. The computer program code, if loaded onto a computer or other programmable apparatus, produces a computer-implemented method. The instructions which execute on the computer or other programmable apparatus may provide the mechanism for implementing some or all of the functions/acts specified in the flowchart and/or block diagram block or blocks. In accordance with various implementations, the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server, such as a cloud-based server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). The computer program code stored in/on (i.e. embodied therewith) the non-transitory computer-readable medium produces an article of manufacture.

The computer program code, if executed by a processor, causes physical changes in the electronic devices of the processor which change the physical flow of electrons through the devices. This alters the connections between devices which changes the functionality of the circuit. For example, if two transistors in a processor are wired to perform a multiplexing operation under control of the computer program code, if a first computer instruction is executed, electrons from a first source flow through the first transistor to a destination, but if a different computer instruction is executed, electrons from the first source are blocked from reaching the destination, but electrons from a second source are allowed to flow through the second transistor to the destination. So a processor programmed to perform a task is transformed from what the processor was before being programmed to perform that task, much like a physical plumbing system with different valves can be controlled to change the physical flow of a fluid.

Unless otherwise indicated, all numbers expressing quantities, properties, measurements, and so forth, used in the specification and claims are to be understood as being modified in all instances by the term “about.” The recitation of numerical ranges by endpoints includes all numbers subsumed within that range, including the endpoints (e.g. 1 to 5 includes 1, 2.78, π, 3.33, 4, and 5).

As used in this specification and the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the content clearly dictates otherwise. Furthermore, as used in this specification and the appended claims, the term “or” is generally employed in its sense including “and/or” unless the content clearly dictates otherwise. As used herein, the term “coupled” includes direct and indirect connections. Moreover, where first and second devices are coupled, intervening devices including active devices may be located there between.

The description of the various embodiments provided above is illustrative in nature and is not intended to limit this disclosure, its application, or uses. Thus, different variations beyond those described herein are intended to be within the scope of embodiments. Such variations are not to be regarded as a departure from the intended scope of this disclosure. As such, the breadth and scope of the present disclosure should not be limited by the above-described exemplary embodiments, but should be defined only in accordance with the following claims and equivalents thereof. 

What is claimed is:
 1. A method for providing sound to a user, the method comprising: detecting a gaze direction of an eye of the user using gaze detection hardware; selecting an object based on the gaze direction; obtaining information related to the object; choosing a sound based on the information; retrieving a digital representation of the sound; and rendering the digital representation of the sound to the user.
 2. The method of claim 1, wherein the object is hidden from view of the user.
 3. The method of claim 1, wherein the information is a type or class of the object.
 4. The method of claim 1, wherein the information is based on a physical property of the object.
 5. The method of claim 4, further comprising: sensing the physical property of the object using a hardware sensor; receiving sensor data based on said sensing; and determining the information related to the object based on the sensor data.
 6. The method of claim 1, wherein the information comprises a hazard related to the object, an identity of something near the object, an identity of something behind the object, an identity of something inside of the object, or an indication of a safe path to the object.
 7. The method of claim 1, further comprising: detecting a predetermined eye gesture performed by the user; and starting said rendering of the digital representation of the sound in response to said detection of the predetermined eye gesture.
 8. The method of claim 7, wherein the predetermined eye gesture comprises a change in the gaze direction of the eye of the user from a first position pointing away from the object to a second position pointing at the object.
 9. The method of claim 1, further comprising: detecting a predetermined eye gesture performed by the user; and halting said rendering of the digital representation of the sound in response to said detection of the predetermined eye gesture.
 10. The method of claim 9, wherein the predetermined eye gesture comprises a change in the gaze direction of the eye of the user from a second position pointing at the object to a third position pointing away from the object.
 11. The method of claim 1, wherein said rendering of the digital representation of the sound is performed to make the sound non-intrusive to the user.
 12. The method of claim 11, wherein said non-intrusive rendering of the digital representation of the sound is mixed to be a background sound with other sounds.
 13. The method of claim 1, wherein said rendering of the digital representation of the sound is presented to the user as a directional sound originating at the object.
 14. The method of claim 1, wherein an association between the information and the sound is based on a setting provided by the user.
 15. The method of claim 1, wherein the sound is selected to convey the information to the user.
 16. The method of claim 15, wherein the information is not visible to the eye of the user by viewing the object.
 17. An article of manufacture comprising a tangible medium, that is not a transitory propagating signal, encoding computer-readable instructions that, when applied to a computer system, instruct the computer system to perform a method comprising: detecting a gaze direction of an eye of a user using gaze detection hardware; establishing a position of an object in a field of view of the user using a sensor; defining a boundary around the object; determining that the gaze direction intersects the boundary around the object; obtaining information associated with the object; choosing a sound based on the information; and playing the sound to the user.
 18. The article of manufacture of claim 17, the method further comprising: ascertaining a type of the object based on an image of the object obtained from the sensor, wherein the information associated with the object comprises the type of the object; and using the type of the object as an index into a table of sounds associated with types of objects.
 19. A head-mounted display (HMD) comprising: a display; a structure, coupled to the display and adapted to position the display in a field-of-view (FOV) of the user; an eye gaze detection subsystem, coupled to the structure; a sound reproduction device, coupled to the structure; and a processor, coupled to the display, the eye gaze detection subsystem, and the sound reproduction device, the processor configured to: detect a gaze direction of an eye of a wearer of the HMD using the eye gaze detection subsystem; select an object based on the gaze direction; obtain information related to the object; choose a sound based on the information; retrieve a digital representation of the sound; and render the digital representation of the sound to the user through the at least one sound reproduction device.
 20. The HMD of claim 19, further comprising a sensor, coupled to the processor; the processor further configured to: receive sensor data related to the object from the sensor; and determine the information related to the object based on the sensor data. 